Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction

Authors

  • David Laehnemann
  • Arndt Borkhardt
  • Alice Carolyn McHardy
Abstract

Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. A large variety of programs is available for removing errors from sequencing read data; they differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions these programs make and the data types for which those assumptions hold, providing guidance on which tools to consider for benchmarking given the properties of the data. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here.

Similar resources

DUDE-Seq: Fast, flexible, and robust denoising for targeted amplicon sequencing

We consider the correction of errors from nucleotide sequences produced by next-generation targeted amplicon sequencing. The next-generation sequencing (NGS) platforms can provide a great deal of sequencing data thanks to their high throughput, but the associated error rates often tend to be high. Denoising in high-throughput sequencing has thus become a crucial process for boosting the reliabi...

DUDE-Seq: Fast, flexible, and robust denoising of nucleotide sequences

Motivation: We consider the correction of errors from nucleotide sequences produced by next-generation sequencing. The error rate in reads has been increasing with the shift of focus of mainstream sequencers from accuracy to throughput. Denoising in high-throughput sequencing is thus becoming a crucial component for boosting the reliability of downstream analyses. Results: Our methodology, name...

A review of DNA sequencing techniques (first, second and third generation)

Introduction: The DNA sequencing is the most important technique in molecular biology by which the order of the nucleotides can be identified in a piece of DNA. There are several different methods for sequencing the DNA. Now, the DNA sequencing has great importance in the medical diagnostics and other medical fields. Some methods have been invented to speed up and increase the efficiency of the...

BLESS: Bloom filter-based error correction solution for high-throughput sequencing reads

MOTIVATION Rapid advances in next-generation sequencing (NGS) technology have led to exponential increase in the amount of genomic information. However, NGS reads contain far more errors than data from traditional sequencing methods, and downstream genomic analysis results can be improved by correcting the errors. Unfortunately, all the previous error correction methods required a large amount ...

Strategies and Clinical Applications of Next Generation Sequencing

Abstract DNA sequencing is one of the great valuable techniques in molecular biology, which can be used to detect the sequence of nucleotides in a DNA fragment. The high-throughput sequencing known as Next Generation Sequencing (NGS) revolutionized genomic research and molecular biology; therefore, the whole human genome can be sequenced with a low cost in several days. NGS technology is simi...


Journal title:

Volume 17, Issue

Pages -

Publication date: 2016